1 Motivation

The International Consortium of Investigative Journalists (ICIJ) appears to have explored the papers with Linkurious, a graph visualization and analysis tool. The ICIJ website also showcases some of the resulting network visualizations.

Hence, from the start, my aim was to explore both the Panama and Paradise Papers with the network packages available in R. This notebook serves as an exploration of the panama-paradise papers and as an introduction to network packages in R, with igraph (http://kateto.net/networks-r-igraph) at the core. Given the number of nodes and edges, I chose to explore only interactive network plots.




2 Library

Load the required R packages.

#general
require(purrr)
require(tidyverse)
require(data.table)
require(lubridate)
require(stringr)
require(ggvis)
require(ggplot2)
require(forcats)
require(ggmap)
require(highcharter)
require(broom)
require(plotly)
require(stringi)

#network plot
require(igraph)
require(ggmap)
require(sna)
require(intergraph)
require(ggnetwork)
require('visNetwork')

require(viridis)

# archive/appendices
require(GGally)
require(networkD3)



3 Data Input

Read the input CSV files.

Entities <- as.data.table(read.csv(file="../input/Entities.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))

Addresses <- as.data.table(read.csv(file="../input/Addresses.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))

Intermediaries <- as.data.table(read.csv(file="../input/Intermediaries.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))

Officers <- as.data.table(read.csv(file="../input/Officers.csv",na.strings=c("","NA"),stringsAsFactors = FALSE))

Edges <- as.data.table(read.csv(file="../input/all_edges.csv",na.strings=c("","NA"), stringsAsFactors = FALSE))
data_list<- c("incorporation_date","inactivation_date","struck_off_date","dorm_date")
Entities[,(data_list):=lapply(.SD,parse_date_time,orders="%d-%m-%Y"),
         .SDcols=data_list]



4 Glossary of terms - Network

  • Links/Edges - A data frame with the links between the nodes. It should include the source (from) and target (to) of each link, plus other properties (depending on the package being used). Sadly, the provided datasets contain no flow of finance/money, just relationships between identities.

  • Nodes/Vertices - A data frame containing the node id and the properties of the nodes (Addresses, Entities, Intermediaries and Officers in this case).

The .csv files are also structured well enough to be adapted to network packages: every listing in the Entities, Addresses, Intermediaries and Officers datasets is keyed by a “node_id”, while the “node_1” and “node_2” columns in all_edges.csv seem to describe the relationship between those nodes.


Nodes/Edges Enhancement

Some igraph functions that aid in interpreting network plots; their results are used as colour, node size, edge width etc.

  • Centrality - Eigenvector centrality scores correspond to the values of the first eigenvector of the graph adjacency matrix; these scores may, in turn, be interpreted as arising from a reciprocal process in which the centrality of each actor is proportional to the sum of the centralities of those actors to whom he or she is connected.

  • Betweenness - The vertex and edge betweenness are (roughly) defined by the number of geodesics (shortest paths) going through a vertex or an edge.

  • Degree - The degree of a vertex is its most basic structural property: the number of its adjacent edges.

  • Community/Clusters - cluster_walktrap tries to find densely connected subgraphs, also called communities, in a graph via random walks. The idea is that short random walks tend to stay in the same community.
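
To make the four measures above concrete, here is a minimal sketch on a toy graph. The graph (two triangles joined by one bridge edge) and all `toy_*` names are made up for illustration and are not part of the datasets; only standard igraph functions are used.

```r
library(igraph)

# Hypothetical toy graph: triangles a-b-c and d-e-f joined by the bridge c-d.
toy_edges <- data.frame(
  from = c("a", "a", "b", "c", "d", "d", "e"),
  to   = c("b", "c", "c", "d", "e", "f", "f")
)
toy <- graph_from_data_frame(toy_edges, directed = FALSE)

toy_degree      <- degree(toy)                        # number of adjacent edges
toy_betweenness <- betweenness(toy)                   # shortest paths through each vertex
toy_centrality  <- eigen_centrality(toy)$vector       # eigenvector centrality (largest value scaled to 1)
toy_community   <- membership(cluster_walktrap(toy))  # random-walk communities

# One row per vertex, ready to be mapped to size/colour in a plot.
data.frame(
  degree      = toy_degree,
  betweenness = toy_betweenness,
  centrality  = round(toy_centrality, 2),
  community   = as.integer(toy_community)
)
```

The two bridge vertices (c and d) carry every shortest path between the triangles, which is why betweenness singles them out even though their degree is only slightly higher than that of the other vertices.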



5 Data Exploration

The very first thing I realised is that it is impossible to realistically plot ~1.5 million edges and ~900k nodes in a single plot. Even if the hardware/software could somehow support it, the result would not be comprehensible to a human. This issue should become clearer after seeing some network plots later.

Since only a limited number of edges and nodes can be shown in a network plot, reducing each node_id to its country should work (both nodes and edges require processing).

Alternatively, one could cluster the nodes and then plot each cluster's sub-network. This is done in a later section, although the method did not occur to me until I was done with the country-based nodes and started examining the clusters.


5.1 Country Network

5.1.1 Nodes

Bind all the identities into the nodes (Entities, Intermediaries, Officers). I excluded the Addresses dataset from these nodes, as all the identities are already tied to their respective countries.

## Nodes
# Combining various identities and label them
Nodes<-rbind(Entities[,.(node_id,countries, country_codes, "Entities")], 
      Intermediaries[,.(node_id,countries, country_codes, "Intermediaries")], 
      Officers[,.(node_id,countries, country_codes, "Officers")])
setnames(Nodes, "V4", "Identity") # data.table method to rename
# colnames(Nodes)[4]<- "Identity"

Some node ids list their country name and code as “Not Identify”, while others are simply left blank.

Instances where:

  • Country = NA - changed to “Unknown” for mapping and aggregation purposes. Perhaps these records have not been worked on yet (as opposed to being not identified within the documents).

  • Country = “Not Identify” - left alone.

Nodes<-Nodes[is.na(countries), ':='(countries= "Unknown", country_codes = "XXX")]

5.1.2 Mapping node_id to country id

For single-country listings

## Records listed for single Country
IndividualCountry_Nodes<-Nodes[!grep(";",countries)] %>% # for single country listing
  # creating id column that is unique to per country
  .[,id:=.GRP, by= countries] 

# Creating a unique Mapping of Country to ID
Country2ID_Map<-IndividualCountry_Nodes[,.(id,countries)]%>%
  unique(., by = c("countries","id"))

#Number of countries
IndividualCountry.Agg<-Nodes[!grep(";",countries),] %>%
  .[,.N,by=c("countries", "country_codes", "Identity")] %>%
  .[order(-N)] %>%
  .[, if(sum(N)> 5000) .SD, by=c("countries")] # filtering for only countries with more than 5k  listings

# plot
hchart(IndividualCountry.Agg, "column", hcaes(x = countries, y = N, group = Identity))%>%
hc_title(text = "Popular single countries listing for nodes id",
             style = list(color = "Black", useHTML = TRUE))
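
The country id above is assigned with data.table's `.GRP`, which numbers groups in order of first appearance. A minimal sketch on made-up rows (the `toy_*` names are hypothetical) shows what the mapping looks like:

```r
library(data.table)

# Hypothetical toy nodes, one country per node.
toy_nodes <- data.table(
  node_id   = 1:5,
  countries = c("Hong Kong", "Bahamas", "Hong Kong", "Unknown", "Bahamas")
)

# .GRP is the running group counter, so every country gets one integer id.
toy_nodes[, id := .GRP, by = countries]

# One row per country: the country-to-id lookup table.
toy_map <- unique(toy_nodes[, .(id, countries)])
toy_map
```

Because the ids depend on row order, the lookup table should be built once and reused, rather than recomputed on a differently sorted table.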

For cross/multiple-country listings in the countries column

For nodes with multiple countries listed, each ordering of the same countries is counted as a separate instance. For example,

  • “British Virgin Islands;Hong Kong”,

  • “Hong Kong;British Virgin Islands”

This would introduce an extra 569 nodes (due to the various combinations of countries), so these should be eliminated/merged away, not to mention that such listings also raise the question of where to place them on a physical world map.

# For records listing multiple countries, most of them are Entities.
# data.frame(table(IndividualCountry_Nodes$Identity))

## CrossCountry Nodes, which listed multiple countries seperated with ";", 
CrossCountry_Nodes<-Nodes[grep(";",countries)] 

# For records listing multiple countries, most of them are Entities.
# data.frame(table(CrossCountry_Nodes$Identity))

#Number of countries
CrossCountry_Nodes.Agg<-CrossCountry_Nodes %>%
  .[,.N,by=c("countries", "country_codes", "Identity")] %>%
  .[order(-N)]%>%
  head(30)

# plot
hchart(CrossCountry_Nodes.Agg, "column", hcaes(x = countries, y = N, group = Identity))%>%
hc_title(text = "Popular cross/multiple countries listing for nodes id",
             style = list(color = "Black", useHTML = TRUE))

# At first I thought about ignoring these, but they might hold valuable information about the links between one entity/intermediary and another.

These should be combined for the count. Reordering the listed country terms after a strsplit counteracts this.

## "British Virgin Islands;Hong Kong" is listed as seperated count as 
# "Hong Kong;British Virgin Islands", Hence, need to combine them in same counts


### This section splits the strings in the countries column, reorders the parts, and combines them back into the original column
## helper function for vapply()
striHelper <- function(x) stri_c(x[stri_order(x)], collapse = ";")

# FUN.VALUE = character(1): each element must come back as a single string
# note: names and codes are sorted independently, so their positions may no longer correspond
CrossCountry_Nodes$countries<-vapply(strsplit(CrossCountry_Nodes$countries,  ";"), striHelper, character(1))
CrossCountry_Nodes$country_codes<-vapply(strsplit(CrossCountry_Nodes$country_codes,  ";"), striHelper, character(1))

# Raw Number Aggregation, can also be use to check the reordering of strings
CrossCountryOccurance<-CrossCountry_Nodes %>%
  .[,.N, by = c("countries", "country_codes")] %>%
  .[order(-N)]
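
The effect of the reordering can be checked on a few made-up strings. This sketch uses base R `sort()` and `paste()` instead of the stringi helper above, and `normalise_countries` is a hypothetical name introduced only for illustration:

```r
# Split on ";", sort the parts alphabetically, and glue them back together.
normalise_countries <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)
  vapply(parts, function(p) paste(sort(p), collapse = ";"), character(1))
}

toy <- c("British Virgin Islands;Hong Kong",
         "Hong Kong;British Virgin Islands",
         "Panama")
normalised <- normalise_countries(toy)

# Both orderings now fall into the same bucket when counted.
table(normalised)
```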

I decided to keep only the first country after reordering to simplify the tracking of countries. This introduces some bias into the data, because the countries are ordered alphabetically.

In essence, this would convert

  • British Virgin Islands;Hong Kong > British Virgin Islands

  • Hong Kong;British Virgin Islands > British Virgin Islands

# #Spliting the multiple countries listed
t.splits <- max(lengths(strsplit(CrossCountry_Nodes[,countries], ";")))
# 
# t.test <- CrossCountry_Nodes[,.(countries,country_codes)] %>%
#   .[, paste0("m.countries",1:t.splits):=tstrsplit(countries,";")] %>%
#   melt(.,  measure.vars = patterns("^m.*"), na.rm = T) %>%
#   .[,.N, by=c("value","countries")] %>%
#   .[order(-N)] %>%
#   .[, if(sum(N)> 500) .SD, by=c("countries")] # filtering for only countries with more than 500 listings
# 
# # plot
# hchart(t.test, "column", hcaes(x = value, y = N, group = countries))

CrossCountry_Nodes<-CrossCountry_Nodes%>%
  .[, paste0("m.countries",1:t.splits):=tstrsplit(countries,";")] %>%
  # this merges the unique country id on the "m.countries1" column, hence introduces a slight bias into the data
  # Perhaps a double-merge approach would be better, such that each listed country is melted into its own entry?
  # It would be messy though.
  .[Country2ID_Map, on=c(m.countries1 = "countries"), nomatch= 0]
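
To see what keeping only the first country discards, here is a small sketch of the split step on made-up listings. The data and the `toy_cross`, `n_splits`, `kept` and `dropped_*` names are hypothetical; `tstrsplit()` is the same data.table function used above.

```r
library(data.table)

# Hypothetical toy data; country strings are already in alphabetical order.
toy_cross <- data.table(
  node_id   = 1:3,
  countries = c("British Virgin Islands;Hong Kong",
                "Cyprus;Russia;Seychelles",
                "Jersey;United Kingdom")
)

# The widest listing decides how many m.countries* columns are needed.
n_splits <- max(lengths(strsplit(toy_cross$countries, ";", fixed = TRUE)))

# tstrsplit() returns one vector per position and pads short listings with NA.
toy_cross[, paste0("m.countries", seq_len(n_splits)) :=
            tstrsplit(countries, ";", fixed = TRUE)]

# Only m.countries1 is matched to a country id; the other columns are the dropped information.
toy_cross[, .(node_id, kept = m.countries1,
              dropped_2 = m.countries2, dropped_3 = m.countries3)]
```

Alphabetically early countries therefore absorb every cross listing they appear in, which is the bias mentioned above.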

5.1.2.1 Country2ID Mapping

Combining both the individual and the cross-listed countries' node_id to country id mappings

## And thus we finally have our node_id to country id mapping ready
Bind_Country2ID_Map<-rbindlist(
  list(
  CrossCountry_Nodes[,.(node_id, m.countries1, Identity, id)],
  IndividualCountry_Nodes[,.(node_id, countries, Identity, id)]
  )
)



5.1.3 Edges - Countries

Pulling only the node ids from Edges as attributes

##Edges
Edges_simplified<-Edges[,.(node_1, node_2)]
# Edges_simplified[complete.cases(Edges_simplified)]

Steps taken:

  • Applying the previously constructed Country2ID Map

  • Aggregate the relationship/connection between countries, summing each incidence (set as weight).[Simple graph]

#merging data table, edges and nodes
Country_id_Edges<-Edges_simplified %>%
  .[Bind_Country2ID_Map, on=c(node_1 = "node_id"), nomatch= 0] %>%
  .[Bind_Country2ID_Map, on=c(node_2 = "node_id"), nomatch= 0] %>%
  .[,.(id,i.id)]%>% #the "ID" is derived from country ID from node_1, the second - "I.ID" is derived from node_2
  .[, .N, by=c("id","i.id")]

colnames(Country_id_Edges)<- c("from", "to", "weight")
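
The two steps above (map both endpoints to a country id, then count) can be sketched on toy data. This version uses `merge()` rather than the `X[Y, on = ...]` joins of the original code, and all table and column names prefixed with `toy_` or ending in `_map` are hypothetical:

```r
library(data.table)

# Hypothetical toy edges between node ids, and a node_id -> country id lookup.
toy_edges    <- data.table(node_1 = c(1L, 2L, 3L, 1L),
                           node_2 = c(2L, 3L, 1L, 3L))
node2country <- data.table(node_id = 1:3, id = c(10L, 10L, 20L))

# One lookup per endpoint, renamed so the two country ids do not clash.
from_map <- node2country[, .(node_1 = node_id, from = id)]
to_map   <- node2country[, .(node_2 = node_id, to = id)]

# Attach the country of each endpoint, then count edges per country pair.
toy_country_edges <- merge(merge(toy_edges, from_map, by = "node_1"),
                           to_map, by = "node_2")[
  , .(weight = .N), by = .(from, to)]
toy_country_edges
```

Edges whose endpoints share a country (10 to 10 here) survive as self loops, which matters for the plotting limitations discussed later.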



5.1.3.1 Geocoding the Nodes

Pinpointing the respective country nodes on the map with geocoding

geocodes_df<-structure(list(lon = c(114.109497, 120.960515, 104.195397, 8.227512, 
103.819836, -51.92528, -80.782127, -172.104629, 9.555373, -3.74922, 
100.992541, -74.297333, -2.13125, 1.521801, 55.491977, -7.6920536, 
4.469936, 34.851612, -5.353585, -2.585278, 53.847818, 33.429859, 
-66.58973, -4.548056, 35.862285, 9.501785, -55.765835, 36.238414, 
-77.39628, -3.435973, -79.6735841, -88.49765, 6.129583, -90.230759, 
10.451526, 57.552152, 35.243322, -95.712891, 7.4246158, 25.0136071, 
-169.867233, -78.183406, 15.472962, 5.291266, 19.5033041, -83.753428, 
-8.224454, -81.2546, 22.937506, 14.375416, 174.885971, -5.54708, 
-70.162651, 2.213749, 12.56738, 105.318756, -106.346771, 45.079162, 
21.824312, 51.183884, -102.552784, -75.015152, -64.7505, 133.775136, 
-62.782998, -63.616672, -88.89653, 14.550072, -170.132217, -58.443832, 
-71.542969, 30.802498, 18.643501, 138.252924, 14.995463, -60.978893, 
113.921327, -86.241905, -61.287228, 47.481766, -72.285215, 29.154857, 
30.217636, -85.207229, -71.797928, -63.068615, 37.906193, 127.766922, 
24.96676, 8.468946, -64.639968, 121.774017, 23.881275, 108.277199, 
24.603189, -69.968338, 25.48583, 31.1655799, 101.975766, 25.7481511, 
53.688046, 19.145136, 28.233608, 8.675277, 35.529562, 113.543873, 
-82.9000751, -63.588653, 48.516388, -68.99002, 78.96288, 21.745275, 
-61.796428, -7.09262, -14.452362, -61.370976, 18.49041, 24.684866, 
-9.429499, -159.777671, 38.996815, -63.05483, 27.953389, 42.590275, 
-59.543198, 9.537499, 50.5577, 166.931503, 47.576927, 69.345116, 
64.585262, -61.222503, 80.771797, -64.896335, -77.781167, 34.888822, 
-66.590149, 34.301525, 18.732207, 144.793731, 55.975413, 15.2000001, 
-77.297508, -3.996166, 19.37439, 66.923684, -19.020835, 166.959158, 
28.369885, 17.873887, 90.356331, 17.228331, 114.727669, -1.023194, 
12.354722, 20.168331, 27.849332, 19.699024, -97.1108285, 171.184478, 
32.290275, 20.939444, -149.406843, 167.954712, 143.95555, 84.124008, 
178.065032, -61.679, 95.955974, 21.005859, 102.495496, 73.22068, 
145.6739, 104.990963, 103.846656, -1.561593, -23.0418, 43.679291, 
21.758664, 1.659626, 46.869107, -11.779889, 29.873888, 46.199616, 
45.038189, 31.465866, 59.556278, -61.024174, 71.276093, 11.609444, 
-63.0500809, -15.310139, 0.824782, 2.315834, -9.696645, -56.027783, 
17.679076, 8.081666, 40.489673, -58.93018, 35.233154, -15.180413, 
160.156194, 10.267895, 55.536384, 12.457777, 74.766098, -175.198242, 
90.433601), lat = c(22.396428, 23.69781, 35.86166, 46.818188, 
1.352083, -14.235004, 8.537981, -13.759029, 47.166, 40.463667, 
15.870032, 4.570868, 49.214439, 42.506285, -4.679574, 53.1423672, 
50.503887, 31.046051, 36.140751, 49.465691, 23.424076, 35.126413, 
6.42375, 54.236107, 33.854721, 56.26392, -32.522779, 30.585164, 
25.03428, 55.378051, 41.6262707, 17.189877, 49.815273, 15.783471, 
51.165691, -20.348404, 38.963745, 37.09024, 43.7384176, 58.595272, 
-19.054445, -1.831239, 49.817492, 52.132633, 47.162494, 9.748917, 
39.399872, 19.3133, -30.559482, 35.937496, -40.900557, 7.539989, 
18.735693, 46.227638, 41.87194, 61.52401, 56.130366, 23.885942, 
39.074208, 25.354826, 23.634501, -9.189967, 32.3078, -25.274398, 
17.357822, -38.416097, 13.794185, 47.516231, -14.270972, -23.442503, 
-35.675147, 26.820553, 60.128161, 36.204824, 46.151241, 13.909444, 
-0.789275, 15.199999, 12.984305, 29.31166, 18.971187, -19.015438, 
12.862807, 12.865416, 21.694025, 18.220554, -0.023559, 35.907757, 
45.943161, 60.472024, 18.420695, 12.879721, 55.169438, 14.058324, 
56.879635, 12.52111, 42.733883, 48.379433, 4.210484, 61.92411, 
32.427908, 51.919438, -29.609988, 9.081999, -18.665695, 22.198745, 
32.1656221, -16.290154, 15.552727, 12.16957, 20.593684, 41.608635, 
17.060816, 31.791702, 14.497401, 15.414999, -22.95764, -22.328474, 
6.428055, -21.236736, 34.802075, 18.04248, 53.709807, 11.825138, 
13.193887, 33.886917, 26.0667, -0.522778, 40.143105, 30.375321, 
41.377491, 10.691803, 7.873054, 18.335765, 21.521757, -6.369028, 
18.220833, -13.254308, 15.454166, 13.444304, 21.4735329, 45.1, 
18.109581, 17.570692, 42.708678, 48.019573, 64.963051, -15.376706, 
47.411631, -11.202692, 23.684994, 26.3351, 4.535277, 7.946527, 
7.369722, 41.153332, -13.133897, 48.669026, 49.8860835, 7.131474, 
1.373333, 6.611111, -17.679742, -29.040835, -6.314993, 28.394857, 
-17.713371, 12.1165, 21.916221, 44.016521, 19.85627, 3.202778, 
15.0979, 12.565679, 46.862496, 12.238333, 16.5388, 33.223191, 
-4.038333, 28.033886, -18.766947, 8.460555, -1.940278, 5.152149, 
40.069099, -26.522503, 38.969719, 14.641528, 38.861034, -0.803689, 
18.0708298, 13.443182, 8.619543, 9.30769, 9.945587, 3.919305, 
43.915886, 17.607789, 9.145, 4.860416, 31.952162, 11.803749, 
-9.64571, 1.650801, -21.115141, 43.94236, 41.20438, -21.178986, 
27.514162)), .Names = c("lon", "lat"), class = "data.frame", row.names = c(NA, 
-209L))
# with ggmap version 2.6 and geocoding without a key, it is possible to run into OVER QUERY LIMIT after just a couple of geocode calls (as the query quota is shared).
# Hence, to get it working reliably, one currently has to install ggmap v2.7 (through GitHub only at the moment) and register a Google key

# devtools::install_github("dkahle/ggmap")
# install.packages("geosphere")

## To get a API key from google API
# https://developers.google.com/maps/documentation/geocoding/get-api-key
# https://stackoverflow.com/questions/36175529/getting-over-query-limit-after-one-request-with-geocode
# register_google(key = "insert key here")
# 
# 
# filelist <- list.files("../input")
# 
# if(any(filelist=="geocodes_df.rds")){
#   #read the created .rds containing the require data
#   geocodes_df <- readRDS("../input/geocodes_df.rds")
# }else{
#   # using geocodes ( part of ggmap package) to find the lat and lon 
#   # perhaps not the cleanest way, some of the location will not be the most accurate.
#   geocodes_df <- geocode(Country2ID_Map$countries)
#   saveRDS(geocodes_df, "../input/geocodes_df.rds")
# }

# bind the extracted coordinates to the nodes, to be used as attributes for plotting later
CountryIDNodes<-cbind(Country2ID_Map,geocodes_df)



5.1.3.2 Edge weight statistics

Given the number of edges, and that we are probably more interested in the most significant links, the edges should perhaps be filtered by weight before plotting the network graph.

summary(Country_id_Edges$weight)
 # Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 # 1.0      2.0      5.0    283.9     21.0   173200.0 

It can be seen that the majority of the connections/edges between country nodes are rather weak: even the 3rd quartile of the weight is a mere 21, while the mean is 283.9, i.e. a very right-skewed distribution with a very long tail.
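
A quick sketch of why the mean is a strict threshold for such a distribution. The weights below are invented for illustration and are not taken from the datasets:

```r
# Hypothetical right-skewed weights: many weak links and one dominant link.
toy_weight <- c(1, 1, 2, 2, 3, 5, 8, 21, 40, 2500)

toy_mean   <- mean(toy_weight)
toy_median <- median(toy_weight)
toy_q3     <- unname(quantile(toy_weight, 0.75))

# With a long right tail the mean sits above even the 3rd quartile,
# so keeping only weights >= mean retains very few edges.
kept <- toy_weight[toy_weight >= toy_mean]
length(kept) / length(toy_weight)
```

Thresholding at the mean is therefore a deliberate choice to keep only the heaviest links; a lower cut-off such as the 3rd quartile would keep roughly a quarter of the edges.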




5.1.4 Plotly - igraph/ggnetwork method

This method of building interactive network plots roughly follows the approach shown by minimaxir (http://minimaxir.com/2016/12/interactive-network/).

Pros

  • ggplot style of plotting, although most node attributes require some work/preprocessing to be depicted nicely

Issue

  • Directionality in edges is not shown in plotly - Thus, directional edges are shown only as lines between nodes.

  • No self loop/arrow can be shown - This is significant in this case, as a self loop can be indicative of a considerable amount of illicit relationships/money movement within a country.

  • Failed to get the geom_edges weight to be depicted as width/size after ggplotly().

To reduce the number of edges plotted, the weight can be thresholded to include only those at or above roughly the mean value. The same threshold can then be applied to remove country nodes that do not show considerable edges/relationships in the extracted datasets.

In this case, I assume that the relationship is directional, from “node_1” to “node_2”.

net <- graph.data.frame(Country_id_Edges[weight>=285, ], 
                        CountryIDNodes[id %in%
                                           sort(unique(
                                             c(
                                               Country_id_Edges[weight>=285]$from, 
                                               Country_id_Edges[weight>=285]$to)
                                           ))], 
                        directed = TRUE)
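# igraph: the graph object above is created while filtering for weight
Nodes_betweenness<- igraph::betweenness(net)

#### Nodes Enhancement 
V(net)$degree <- igraph::degree(net, mode = "all")
V(net)$betweenness <-log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))
# the edge attribute is named "weight" (lower case); E(net)$Weight would be NULL
V(net)$centrality <- eigen_centrality(net, weights=E(net)$weight)$vector
# community membership has to be computed before it can be coloured (walktrap, as in the highcharter section)
V(net)$community <- colorize(as.integer(membership(cluster_walktrap(net))))
V(net)$text <- V(net)$countries
V(net)$color <- colorize(V(net)$degree)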

#### Edge Enhancement
#Need to manually allocate the edge lat/lon to the appropriate coordinates
end_loc <- data.table(ename=as.integer(get.edgelist(net)[,2]))
end_loc<- CountryIDNodes[end_loc, on= c(id="ename"), nomatch= 0]

start_loc <- data.table(ename=as.integer(get.edgelist(net)[,1]))
start_loc<- CountryIDNodes[start_loc, on= c(id="ename"), nomatch= 0]


### Setting coordinates of edges arrow
E(net)$endlat <- end_loc$lat
E(net)$endlon <- end_loc$lon

E(net)$startlat <- start_loc$lat
E(net)$startlon <- start_loc$lon


### Scaling of weight
# applying a logarithmic scale to rescale the weight into the 0 to 1 range
E(net)$weight<-log(1+E(net)$weight)/log(1+max(E(net)$weight))
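
The log rescaling above can be compared with a plain linear rescaling on a few example weights. `rescale_log` is a hypothetical helper name, and the weights are chosen only to span the thresholded range:

```r
# Same formula as above: log(1 + w) divided by the log of the largest shifted weight.
rescale_log <- function(w) log(1 + w) / log(1 + max(w))

toy_weight <- c(285, 1000, 20000, 173200)

scaled_log    <- rescale_log(toy_weight)
scaled_linear <- toy_weight / max(toy_weight)

# The linear version squashes everything but the largest edge towards 0;
# the log version keeps the weaker edges visible while preserving their order.
round(cbind(scaled_log, scaled_linear), 3)
```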


5.1.4.1 Country Network plot on Map

Customised variables on plotly

  • Node size - centrality: the centrality of each country is proportional to the sum of the centralities of the country nodes it is connected with. This parameter is also influenced by the weight of the connected edges and by self-looping edges. (Larger ~ more central to the network)

  • Node colour - degree: how many connections a node has, counting both in and out connections (a lighter colour means that a country node has more connections)

df_net <- ggnetwork(net, layout = "kamadakawai", weights="weight")
# ggnetwork essentially converts the igraph structure 'net' into a data frame, which is easier and more familiar to work with, but also very limiting.

plot <- ggplot(arrow.gap = 0.025) +
    borders("world",
           colour ="black", fill="#7f7f7f", size=0.10, alpha=1/2)+
  geom_edges(data = df_net,aes(x = lon, y = lat, xend = endlon, yend = endlat),
             size = 0.4, alpha=0.25 ,  #size parameter in geom edge  is not passed over correctly into ggplotly, it seems to be carry over to borders(country) in plotly too
             arrow = arrow(length = unit(10, "pt"), type = "closed")) +
  geom_nodes(data=df_net,aes(x=lon, y=lat, xend=endlon,yend=endlat, 
                             size=centrality, colour=sqrt(degree), text=text)) +
    scale_colour_viridis() +
  ggtitle("Relationship of Countries with various nodes") + 
  ## geom_map would provide a nicer map, but proved to be problematic for ggplotly
  # geom_map(data=world, map=world, aes(x=long, y=lat, map_id=region),
  #          color="white", fill="#7f7f7f", size=0.05, alpha=1/4) +

  guides(size=FALSE, color=FALSE) +
  theme_blank()+
  # https://github.com/ropensci/plotly/issues/842
  theme(legend.position='none') #translate to hide legend in plotly

plot %>% ggplotly(tooltip="text") 
#%>% toWebGL()
#issue, arrow head doesn't get translated into plotly via ggplotly
# no self loop is shown

In the exposed network trails, the British Virgin Islands appears to be among the most connected country nodes, followed by Unknown (identities that did not list a country) and the Bahamas. Other major country nodes with high degree (colour) and centrality (size) are Hong Kong, Singapore, UAE, Cyprus, Switzerland, Not Identify and the UK.


The centrality and degree of the most connected countries in the network can also be plotted.

by_countries.raw<-setDT(df_net)[is.na(endlat),
              .(countries, betweenness,centrality,degree)][order(-centrality)]%>%
  .[,name:=as.character(countries)] %>%
  head(30)

x<- c("Country", "Betweenness", "Centrality","Degree")
y<- c("{point.countries}", (sprintf("{point.%s:.2f}", c("betweenness", "centrality","degree"))))
tltip<- tooltip_table(x,y)

hchart(by_countries.raw, "scatter", hcaes(centrality, betweenness, size= degree, color=degree), dataLabels=list(enabled=T, format = '{point.countries}'))%>%
  hc_title(text="Network Attributes of Country Nodes in panama-paradise papers")%>%
  hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)
  #   hc_yAxis(type="logarithmic")%>%
  # hc_xAxis(type="logarithmic")

It can be seen that the nodes most central to the network are the Bahamas, the British Virgin Islands and Hong Kong.




#   
# df_net <- ggnetwork(net, layout = "fruchtermanreingold", weights="weight", niter=50000, arrow.gap=0)
#  # layout = "kamadakawai"
# # arrow.gap = 0.025 # 
# # arrow gap default value for directed graph, but the arrows aren't carried over in plottly
# # niter -  This argument controls the number of iterations to be employed. Larger values take longer, but will provide a more refined layout. (Defaults to 500.)
# 
# plot <- ggplot() +
#   geom_edges(data = df_net,aes(x = x, y = y, xend = xend, yend = yend),
#              size=0.4, alpha=0.25) +
#   geom_nodes(data = df_net,aes(x = x, y = y, xend = xend, yend = yend, 
#                                size = degree, color = degree, text=text)) +
#   ggtitle("Relationship of Countries with various nodes") + 
#   scale_colour_viridis() +
#   ## geom_map would provide a nicer map, but proved to be problematic when chaining through ggplotly
#   # geom_map(data=world, map=world, aes(x=long, y=lat, map_id=region),
#   #          color="white", fill="#7f7f7f", size=0.05, alpha=1/4) +
#   # scale_color_manual(labels=c("EWR", "JFK", "LGA", "Others"),
#   #                      values=c(colors, "#1a1a1a"), name="Airports") +
#   guides(size=FALSE, color=FALSE) +
#   theme_blank()+
#   # https://github.com/ropensci/plotly/issues/842
#   theme(legend.position='none') #translate to hide legend in plotly
# 
# 
# 
# #raw plot
# plot
# 
# #plotlly plot
# plot %>% ggplotly(tooltip="text") 

5.1.5 Highcharter - igraph

The network can also be plotted in a layout governed by an algorithm, in which the properties of the various nodes and edges are better depicted. In this case, I chose the Kamada-Kawai layout algorithm and highcharter.

# igraph: recompute the node metrics on the thresholded graph built earlier
Nodes_betweenness<- igraph::betweenness(net)
membership<- membership(cluster_walktrap(net))

#### Nodes Enhancement 
V(net)$degree<-igraph::degree(net, mode = "all")
V(net)$betweenness<-log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))
# the edge attribute is named "weight" (lower case); E(net)$Weight would be NULL
V(net)$centrality<-eigen_centrality(net, weights=E(net)$weight)$vector
V(net)$text<-V(net)$countries
V(net)$color<-colorize(membership)
V(net)$size<-V(net)$degree

#### Edge Enhancement
#Need to manually allocate the edge lat/lon to the appropriate coordinates
# CountryIDNodes[end_loc, ...] returns one row per edge, in edge order
end_loc<-data.table(ename=as.integer(get.edgelist(net)[,2]))
end_loc<-CountryIDNodes[end_loc, on= c(id="ename"), nomatch= 0]

### Setting coordinates of edges arrow
E(net)$endlat<-end_loc$lat
E(net)$endlon<-end_loc$lon

### Scaling of weight
# applying a logarithmic scale to rescale the weight into the 0 to 1 range
# note: if the rescaling in 5.1.4 was already run on this same 'net', the log scale is applied a second time here
E(net)$weight<-log(1+E(net)$weight)/log(1+max(E(net)$weight))
E(net)$width<-E(net)$weight*3

#Doesn't appear to be working
# E(net)$arrow.size<- 12

hchart(net, layout=layout_with_kk)%>%
  hc_title(text="Network Attributes of Country Nodes in panama-paradise papers")

### couldn't get the nodes to be fix on respective coordinate of the countries.
# hchart(net, layout=as.matrix(geocodes_df))
# Error in UseMethod("layout") : no applicable method for 'layout' applied to an object of class "igraph"



5.1.6 Visnetwork

visNetwork is well documented on its website.

#thresholding
vis_edge<-Country_id_Edges[weight>=285,]
vis_node<-CountryIDNodes[id %in% sort(unique(
                                             c(
                                               Country_id_Edges[weight>=285]$from, 
                                               Country_id_Edges[weight>=285]$to)
                                           ))]

# using igraph to calculate some betweenness and degree
net<-graph.data.frame(vis_edge, vis_node, directed = TRUE)
    
Nodes_betweenness<-igraph::betweenness(net) # Node size
Nodes_Degree<-igraph::degree(net, mode = "all")
  
## Enhancement
## ?visNodes
vis_node$shape <- "dot"
vis_node$shadow <- TRUE # Nodes will drop shadow
vis_node$label <-vis_node$countries
vis_node$title <- vis_node$countries
vis_node$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 25 #default to 25
vis_node$borderWidth <- 2 # Node border width
vis_node$color.background <- colorize(Nodes_Degree)
vis_node$color.border <- "black"
vis_node$color.highlight.background <- "orange"
vis_node$color.highlight.border <- "darkred"

## Define the starting position of the nodes as the coordinates of the countries, so that their location on the graph bears some semblance to their location on the map (i.e. Australia is down south etc.)
vis_node$x<- vis_node$lon+180
vis_node$y<- -vis_node$lat+90

## Physics can be disabled so that the nodes do not move from their initial location (lat/lon); this is not used as it generated a plot that is rather hard to read.
# vis_node$physics<- F
# vis_edge$physics<- T

# ?visEdges
vis_edge$shadow <- FALSE    # edge shadow
vis_edge$width <-log(1+vis_edge$weight)/log(1+max(vis_edge$weight)) # default to 1
vis_edge$arrows <- "middle" # arrows: 'from', 'to', or 'middle'

set.seed(1)
visNetwork(edges=vis_edge, nodes=vis_node, main="Aggregated Network plot of Countries Nodes Network")%>%
  visOptions(highlightNearest = TRUE) 

## While the initial zoom level can be set, this requires either disabling visPhysics's stabilization or using visIgraphLayout, which would sacrifice the cleanliness of the plot

## Choosing to turn off the stabilization option in physics would mean the stabilization iterations get plotted: aesthetically and physically impressive, but not useful

# visEvents(type = "once", startStabilizing = "function() {
#             this.moveTo({scale:0.5})}") %>%
#   visPhysics(stabilization = FALSE)%>% 

# %>% visIgraphLayout() 
## While it yields an OK map with the igraph layout, it is relatively messy as the nodes and edges can be in close proximity to one another.

You will have to scroll with your mouse to zoom into the network plot. Unfortunately, setting the initial zoom level brought about some undesirable side effects, so I’m disabling it for now.

The country coordinates (lat, lon) of the respective nodes are used as the starting locations of the network plot. Hence, the final location of the nodes (countries) should bear some resemblance to their respective location on the world map (Australia should always be south etc.).

You can click on an individual node to highlight it and its first-degree neighbours, to explore the major flows of the network for selected countries.




5.2 Graphing the Network between nodes id

5.2.1 Nodes

## Nodes
#merging them all on node_id doesn't seem to result in a very useful plot; the code for merging that I tried running can be found in the appendices section.

# Combining various identities and label them
Nodes<-rbind(
  Entities[,.(node_id,countries, country_codes, nameID=name, sourceID, Identity="Entities")], 
  Intermediaries[,.(node_id,countries, country_codes, nameID=name, sourceID, Identity="Intermediaries")], 
  Officers[,.(node_id,countries, country_codes, nameID=name, sourceID, Identity="Officers")],
  Addresses[,.(node_id, countries, country_codes, sourceID, Identity="Addresses")]
  , fill=TRUE)
#I initially thought that addresses wouldn't be needed in the full network diagram, but later found out that if I exclude the Addresses dataset, I cannot form a network graph, as some of the edges require a connection to a node_id that can only be found in the Addresses dataset.

Removing duplicate listings of nodes which share an id but exist under different identities.

# This combined data frame of nodes is not directly network-graphable, as the node_id is not unique. Below we explore these non-unique node_id records.
Non_unique_ID <-Nodes[, fD := .N > 1, by = node_id][fD==TRUE] %>%
  .[order(node_id)]
# Non_unique_ID %>%head(30)

# So, apparently some IDs have entries for both Intermediaries and Officers, so a simple row bind to combine them leaves the node_id non-unique.

# after some testing, it appears that this issue only occurs between Intermediaries and Officers.

# Dropping the Officers row if the node_id is already occupied by an intermediary.
Nodes<-Nodes[!(fD==TRUE & Identity=="Officers")]

# Dropping the fD column as it is no longer needed.
Nodes$fD <- NULL
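
To make the de-duplication step concrete, here is a minimal sketch on a made-up table. The IDs and names are invented for illustration and are not taken from the ICIJ data; only the flag-then-drop pattern mirrors the chunk above.

```r
library(data.table)

# Toy node table (hypothetical values): node_id 2 appears both as an
# Intermediary and as an Officer
toy_nodes <- data.table(
  node_id  = c(1L, 2L, 2L, 3L),
  nameID   = c("Alpha Ltd", "Beta Agents", "Beta Agents", "C. Doe"),
  Identity = c("Entities", "Intermediaries", "Officers", "Officers")
)

# Flag every row whose node_id occurs more than once
toy_nodes[, fD := .N > 1, by = node_id]

# Keep the Intermediaries row and drop the duplicated Officers row
toy_dedup <- toy_nodes[!(fD == TRUE & Identity == "Officers")]
toy_dedup[, fD := NULL]

print(toy_dedup)
```

Officers whose node_id is unique (node 3 here) are kept, because the filter only removes Officers rows that are flagged as duplicates.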

5.2.2 Edges

Simplifying the edges so that they can be visualised by line type

# I intended to use different arrow types to display the edges within the network plot, but there are simply far too many relationship types, as indicated by the rel_type column in Edges. The majority of relationships are, however, well covered by the top 30 types.

# Hence, I simplify by defining 3 types of edges (numbered as in Edge_Type below):
# 1) Directional relationship (within the top 30 types), e.g. intermediary of / shareholder of / director of
# 2) Others (not included in the top 30 most popular relationship types)
# 3) Identical relationship (listed from the top 30 types), e.g. same name as


popular_rel_type<-Edges[,.N, by=c("rel_type")] %>%
  .[order(-N)] %>%
  head(30)

# within the top 30 most common relationship

identical_relation_list <- c("similar name and address as",
                        "same name as",
                        "same company as",
                        "same name and registration date as",
                        "same address as")

Edges[rel_type %in% popular_rel_type$rel_type, Edge_Type:=1]%>%
  .[!(rel_type %in% popular_rel_type$rel_type), Edge_Type:=2]%>%
  .[rel_type %in%identical_relation_list, Edge_Type :=3]

popular_rel_type<-Edges[,.N, by=c("rel_type", "Edge_Type")] %>%
  .[order(-N)] %>%
  head(10)
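
The classification can be checked on a small example. This is only a sketch under stated assumptions: the relationship counts are invented, and a top 3 stands in for the top 30 so that the table stays readable.

```r
library(data.table)

# Toy edge list (hypothetical counts): three common types and one rare one
toy_edges <- data.table(
  rel_type = c(rep("shareholder of", 5), rep("same name as", 3),
               rep("director of", 2), "rare link")
)

# Most frequent relationship types (top 3 here instead of top 30)
top_types <- toy_edges[, .N, by = rel_type][order(-N)][1:3, rel_type]
identical_types <- c("same name as")

toy_edges[, Edge_Type := 2L]                               # default: others
toy_edges[rel_type %in% top_types, Edge_Type := 1L]        # popular types
toy_edges[rel_type %in% identical_types, Edge_Type := 3L]  # identical, applied last

print(toy_edges[, .N, by = .(rel_type, Edge_Type)])
```

The order of the assignments matters: the identical list is applied last, so a type such as "same name as" ends up as type 3 even though it is also among the popular types.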

# plot
hchart(popular_rel_type, "column", hcaes(x = rel_type, y = N, group = Edge_Type))%>%
hc_title(text = "Top 10 relationship types between node IDs",
             style = list(color = "Black", useHTML = TRUE))
##Edges
Edges_simplified<-Edges[,.(node_1, node_2, rel_type, Edge_Type, sourceID)]
colnames(Edges_simplified) <-c("from", "to", "rel_type", "edge_type", "sourceID")
## Building a directed network graph so that both the total connections and the outgoing connections of nodes can be examined
net <- graph.data.frame(Edges_simplified, vertices=Nodes, directed = T)

### Degree, the connections of edges
# nodes_degree_all <- igraph::degree(net, mode = "all")
# nodes_degree_out <- igraph::degree(net, mode = "out")
# The degree of a vertex is its most basic structural property, the number of its adjacent edges.

### Betweenness, the number of shortest paths going through a vertex.
### It doesn't seem sensible to examine the network plot with this
# nodes_betweenness<- igraph::betweenness(net)
## The vertex and edge betweenness are (roughly) defined by the number of geodesics (shortest paths) going through a vertex or an edge.

# nodes_centrality <- eigen_centrality(net)
## Eigenvector centrality scores correspond to the values of the first eigenvector of the graph adjacency matrix; these scores may, in turn, be interpreted as arising from a reciprocal process in which the centrality of each actor is proportional to the sum of the centralities of those actors to whom he or she is connected.

## allocating the calculated nodes attributes into a dataframe
nodes_attributes<-data.table(
  nodes_id=names(igraph::degree(net, mode = "all")), 
  nodes_degree_all=(igraph::degree(net, mode = "all")), 
  nodes_degree_out=(igraph::degree(net, mode = "out")), 
  nodes_betweenness=(igraph::betweenness(net)),
  centrality=(eigen_centrality(net)$vector))
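
As a sanity check on what these attributes measure, the sketch below computes degree and betweenness on a three-vertex directed path. The toy graph and the names `toy_net` and `toy_attr` are assumptions made for illustration.

```r
library(igraph)

# Toy directed path: a -> b -> c
toy_edges <- data.frame(from = c("a", "b"), to = c("b", "c"))
toy_net <- graph_from_data_frame(toy_edges, directed = TRUE)

toy_attr <- data.frame(
  id          = V(toy_net)$name,
  degree_all  = as.vector(igraph::degree(toy_net, mode = "all")),
  degree_out  = as.vector(igraph::degree(toy_net, mode = "out")),
  betweenness = as.vector(igraph::betweenness(toy_net))
)
rownames(toy_attr) <- toy_attr$id
print(toy_attr)
```

Vertex b has two adjacent edges, one of them outgoing, and it is the only vertex lying on a shortest path between two others (a to c), so it is the only one with non-zero betweenness.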

## Clustering: this decomposes the available nodes and edges into separate communities wherever they are not connected.

decomposed_graph_list<-decompose.graph(net)
# this returns a list with a separate graph for each component
# plot(decomposed_graph_list[[231]])


## Calculating the number of members per decomposed graph and storing it as a data table.
vcount_dt<-data.table(unlist(lapply(decomposed_graph_list,vcount)),keep.rownames=T)
vcount_dt$cluster_id<-rownames(vcount_dt)
setnames(vcount_dt, "V1", "vcount")
popular_DG_list<-vcount_dt[order(-vcount)]%>%head(10)
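
The decomposition and the size table can be reproduced on a toy graph. This sketch uses `igraph::decompose()`, the current name for the function called as `decompose.graph()` above; the graph itself is made up.

```r
library(igraph)

# Toy graph with three components: a-b-c, d-e, and an isolated vertex f
toy_net <- graph_from_data_frame(
  data.frame(from = c("a", "b", "d"), to = c("b", "c", "e")),
  vertices = data.frame(name = c("a", "b", "c", "d", "e", "f")),
  directed = TRUE
)

# One graph per weakly connected component (the default mode)
toy_parts <- igraph::decompose(toy_net)

# Number of vertices per component, largest first
toy_sizes <- data.frame(
  cluster_id = seq_along(toy_parts),
  vcount     = sapply(toy_parts, vcount)
)
toy_sizes <- toy_sizes[order(-toy_sizes$vcount), ]
print(toy_sizes)
```

Edge direction is ignored when components are "weak", which is why a directed chain such as a -> b -> c still counts as a single community.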



# plot
hchart(popular_DG_list, "column", hcaes(x= cluster_id, y = vcount))%>%
hc_title(text = "Number of Nodes in clusters",
             style = list(color = "Black", useHTML = TRUE))%>%
    hc_yAxis(type="logarithmic")
  


## Choosing clusters of different size to plot
#large id=991, N=406
#medium id=185, N=166
#small  id=5050, N=16

# plot(decomposed_graph_list[[1]])

Of all the nodes, 942172 are already connected within a single community/network (cluster id 1). Given the size of the graph (~1M nodes and their associated edges), it would be impossible to plot the full network outright in R. I can see why ICIJ teamed up with Linkurious to explore the network.

We can, though, explore the network through its node properties, say, by picking out the nodes that are most central to the network or the nodes that are most well connected.

5.2.3 Nodes Properties

5.2.3.1 Centrality

This scores each node by its eigenvector centrality, i.e. a node's score is proportional to the sum of the centralities of the nodes it is connected to.
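
A small made-up graph shows the effect. In this sketch a hub "h" is tied to four leaves, and two of the leaves are also tied to each other; the graph and its vertex names are assumptions for illustration only.

```r
library(igraph)

# Toy undirected graph: hub "h" with four leaves, plus one leaf-to-leaf tie
toy_net <- graph_from_data_frame(
  data.frame(from = c("h",  "h",  "h",  "h",  "l1"),
             to   = c("l1", "l2", "l3", "l4", "l2")),
  directed = FALSE
)

toy_cent <- eigen_centrality(toy_net)$vector
names(toy_cent) <- V(toy_net)$name

print(round(sort(toy_cent, decreasing = TRUE), 3))
```

The hub scores highest, and the leaves l1 and l2 score above l3 and l4 because each of them has one extra neighbour that is itself connected to the hub.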

# # Exploring centrality
# High_Centrality_Nodes<-nodes_attributes[centrality>=0.002681][order(-centrality)]%>%head(30)
# # Changing the class of nodes_id to integer for merging
# High_Centrality_Nodes$node_id<- as.integer(High_Centrality_Nodes$nodes_id)
# # Merging with original nodes to acquire nodes attributes
# H_Centrality_dt<-Nodes[High_Centrality_Nodes, on=c(node_id="node_id")]

nodes_attributes$nodes_id<- as.integer(nodes_attributes$nodes_id)

# Exploring centrality
H_Centrality_dt<-nodes_attributes[order(-centrality)] %>%
  Nodes[., on=c(node_id="nodes_id")]%>%
  .[!(Identity=="Addresses"),]%>%
  head(15)

H_Centrality_dt[,.(nameID, sourceID, Identity, node_id, countries)]

# 
# ##Highcharter
# #tooltip table
# x<- c("NodeName", "Degree all","Degree out","Betweenness", "Centrality")
# y<- c("{point.nameID}", (sprintf("{point.%s:.2f}",
#                                  c("nodes_degree_all", "nodes_degree_out",
#                                    "nodes_betweenness", "centrality"))))
# tltip<- tooltip_table(x,y)
# 
# #plot
# hchart(H_Centrality_dt, "scatter", hcaes(centrality, nodes_betweenness,
#                                          size= nodes_degree_all, color=nodes_degree_out), dataLabels=list(enabled=T, format = '{point.nameID}'))%>%
#   hc_title(text="Top 30 highest centrality Nodes in panama-paradise papers")%>%
#   hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)%>%
#   hc_yAxis(type="logarithmic")%>%
#   hc_xAxis(type="logarithmic")


5.2.3.2 Degree

Alternatively, we can identify the nodes that are most well connected (i.e. those that appear most often in the edge list).

H_Degree_dt<-nodes_attributes[order(-nodes_degree_all)] %>%
  Nodes[., on=c(node_id="nodes_id")]%>%
  .[!(Identity=="Addresses"),]%>%
  head(15)

H_Degree_dt[,.(nameID, nodes_degree_all, sourceID, Identity, node_id, countries)]
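
The ranking relies on a data.table join whose key columns must have the same type, which is why nodes_id was cast to integer earlier. Below is a minimal sketch of the same pattern with invented values; `toy_nodes`, `toy_attr` and `toy_top` are hypothetical names.

```r
library(data.table)

toy_nodes <- data.table(node_id = c(10L, 20L, 30L),
                        nameID  = c("Alpha Ltd", "Beta Agents", "C. Doe"))

# igraph returns vertex names as character, so the IDs start out as strings
toy_attr <- data.table(nodes_id = c("10", "20", "30"),
                       nodes_degree_all = c(2, 7, 4))
toy_attr[, nodes_id := as.integer(nodes_id)]  # match the type of node_id

# X[i, on = ...] keeps the row order of i, so sorting i first ranks the result
toy_top <- toy_nodes[toy_attr[order(-nodes_degree_all)],
                     on = c(node_id = "nodes_id")][1:2]
print(toy_top)
```

Because the result follows the row order of the sorted attribute table, taking the first rows afterwards gives the best-connected nodes together with their names.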


# #Exploring degrees
# # nodes with most in connections
# Most_in_connected<-nodes_attributes[order(nodes_degree_out, -nodes_degree_all)]%>%head(30)
# Nodes[node_id %in% Most_in_connected$nodes_id]
# 
# 
# # nodes with most outgoing connections
# Most_out_connected<-nodes_attributes[order(-nodes_degree_out, nodes_degree_all)]%>%head(30)
# Nodes[node_id %in% Most_out_connected$nodes_id]



5.2.4 Sub-communities

We can, however, adequately plot selected sub-communities (given a selected clustering technique), or focus on a single identified node and its neighbours.

Below are plots of some isolated communities.

subnodes_large<-data.table(as_data_frame(decomposed_graph_list[[991]], what = c("vertices")))
subedges_large<-data.table(as_data_frame(decomposed_graph_list[[991]], what = c("edges")))

subnodes_medium<-as_data_frame(decomposed_graph_list[[185]], what = c("vertices"))
subedges_medium<-as_data_frame(decomposed_graph_list[[185]], what = c("edges"))

subnodes_small<-as_data_frame(decomposed_graph_list[[5050]], what = c("vertices"))
subedges_small<-as_data_frame(decomposed_graph_list[[5050]], what = c("edges"))

Subnodes_Large

# using igraph to calculate some betweenness and degree
subnet_large<-graph.data.frame(subedges_large, subnodes_large, directed = TRUE)
    
Nodes_betweenness<-igraph::betweenness(subnet_large) # Node size
Nodes_Degree<-igraph::degree(subnet_large, mode = "all")
  
# Enhancement
# ?visNodes
subnodes_large$id<- subnodes_large$name
subnodes_large$shadow <- TRUE # Nodes will drop shadow
subnodes_large$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 25 #default to 25
subnodes_large$borderWidth <- 2 # Node border width
subnodes_large$color.background <- colorize(Nodes_Degree)
subnodes_large$color.border <- "black"
subnodes_large$color.highlight.background <- "orange"
subnodes_large$color.highlight.border <- "darkred"
subnodes_large$shape <- factor(subnodes_large$Identity,
                                levels=c("Entities","Intermediaries","Officers","Addresses"),
                                labels=c("dot","triangle","square","diamond"))
subnodes_large$label <-subnodes_large$nameID
subnodes_large$title <- paste0("<p>",subnodes_large$nameID,"<br>",subnodes_large$countries,"</p>")

# ?visEdges
subedges_large$shadow <- FALSE    # edge shadow
subedges_large$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
subedges_large$dashes <- (subedges_large$edge_type==3)
subedges_large$label<- subedges_large$rel_type

set.seed(1)
visNetwork(edges=subedges_large, nodes=subnodes_large, main="Extracted Large Cluster")  %>% 
           # height="400px", width="100%")  %>% 
  visIgraphLayout() %>%
  visOptions(highlightNearest = list(enabled=T, degree=1, hover=F)) %>% 
  visNodes(scaling = list(min = 10, max = 50))
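
One caveat about the node-size formula: it divides by log(1 + max(betweenness)), which is zero when no vertex in the cluster lies on a shortest path between two others (for example, in a cluster with a single edge). The helper below is a hypothetical sketch, not part of the notebook, that keeps the same formula and falls back to a uniform size in that case.

```r
# Hypothetical helper: same scaling as above, with a fallback for the
# case where every betweenness value is zero
scale_node_size <- function(betweenness, base_size = 25) {
  denom <- log(1 + max(betweenness))
  if (!is.finite(denom) || denom == 0) {
    # Nothing to differentiate the nodes by: give them all the same size
    return(rep(base_size, length(betweenness)))
  }
  log(10 + betweenness) / denom * base_size
}

sizes_flat   <- scale_node_size(c(0, 0, 0))    # all betweenness zero
sizes_varied <- scale_node_size(c(0, 5, 120))  # sizes grow with betweenness
print(sizes_flat)
print(sizes_varied)
```

With this helper the assignment would read, for instance, `subnodes_large$size <- scale_node_size(Nodes_betweenness)`.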



Subnodes_medium

# using igraph to calculate some betweenness and degree
subnet_medium<-graph.data.frame(subedges_medium, subnodes_medium, directed = TRUE)
    
Nodes_betweenness<-igraph::betweenness(subnet_medium) # Node size
Nodes_Degree<-igraph::degree(subnet_medium, mode = "all")
  
# Enhancement
# ?visNodes
subnodes_medium$id<- subnodes_medium$name
subnodes_medium$shadow <- TRUE # Nodes will drop shadow
subnodes_medium$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 10 #default to 25
subnodes_medium$borderWidth <- 2 # Node border width
subnodes_medium$color.background <- colorize(Nodes_Degree)
subnodes_medium$color.border <- "black"
subnodes_medium$color.highlight.background <- "orange"
subnodes_medium$color.highlight.border <- "darkred"
subnodes_medium$shape <- factor(subnodes_medium$Identity,
                                levels=c("Entities","Intermediaries","Officers","Addresses"),
                                labels=c("dot","triangle","square","diamond"))
subnodes_medium$label <-subnodes_medium$nameID
subnodes_medium$title <- paste0("<p>",subnodes_medium$nameID,"<br>",subnodes_medium$countries,"</p>")


# ?visEdges
subedges_medium$shadow <- FALSE    # edge shadow
subedges_medium$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
subedges_medium$dashes <- (subedges_medium$edge_type==3)
subedges_medium$label<- subedges_medium$rel_type

set.seed(1)
visNetwork(edges=subedges_medium, nodes=subnodes_medium, main="Extracted Medium Cluster")  %>% 
  # visIgraphLayout() %>%
  visOptions(highlightNearest = TRUE)



Subnodes_Small

# using igraph to calculate some betweenness and degree
subnet_small<-graph.data.frame(subedges_small, subnodes_small, directed = TRUE)
    
Nodes_betweenness<-igraph::betweenness(subnet_small) # Node size
Nodes_Degree<-igraph::degree(subnet_small, mode = "all")
  
# Enhancement
# ?visNodes
subnodes_small$id<- subnodes_small$name
subnodes_small$shadow <- TRUE # Nodes will drop shadow
subnodes_small$label <-subnodes_small$countries
subnodes_small$title <- subnodes_small$nameID
subnodes_small$size <- log(10+Nodes_betweenness)/log(1+max(Nodes_betweenness))* 10 #default to 25
subnodes_small$borderWidth <- 2 # Node border width
subnodes_small$color.background <- colorize(Nodes_Degree)
subnodes_small$color.border <- "black"
subnodes_small$color.highlight.background <- "orange"
subnodes_small$color.highlight.border <- "darkred"
subnodes_small$shape <- factor(subnodes_small$Identity,
                                levels=c("Entities","Intermediaries","Officers","Addresses"),
                                labels=c("dot","triangle","square","diamond"))
subnodes_small$label <-subnodes_small$nameID
subnodes_small$title <- paste0("<p>",subnodes_small$nameID,"<br>",subnodes_small$countries,"</p>")

# ?visEdges
subedges_small$shadow <- FALSE    # edge shadow
subedges_small$arrows <- "middle" # arrows: 'from', 'to', or 'middle'
subedges_small$dashes <- (subedges_small$edge_type==3)
subedges_small$label<- subedges_small$rel_type

set.seed(1)
visNetwork(edges=subedges_small, nodes=subnodes_small, main="Extracted Small Cluster")  %>% 
  # visIgraphLayout() %>%
  visOptions(highlightNearest = TRUE)



6 Appendices

Dumped/archived code chunks that I tried but that failed to give adequate results.

6.1 Merging all the datasets together

This is the archived version of my previous attempt to merge all the datasets (on node_id only) while preserving the individual columns by renaming them before merging.

Ultimately, this yields a very unwieldy and very sparse data.frame/data.table. It turns out that most node_ids have only one identity (entity, intermediary or officer). Some node_ids have entries for both intermediaries and officers, but this is rather rare.

The rbind method that I used in the main routine works better in this case.
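
The difference between the two shapes can be seen on a pair of one-row tables. This is a sketch with invented rows; the `Ent.` prefix follows the renaming idea used in the archived code below, and the remaining names are hypothetical.

```r
library(data.table)

toy_entities  <- data.table(node_id = 1L, name = "Alpha Ltd")
toy_addresses <- data.table(node_id = 2L, address = "1 Harbour Road")

# Long form: one row per node; columns missing from a table are filled with NA
toy_long <- rbind(
  toy_entities[, .(node_id, nameID = name, Identity = "Entities")],
  toy_addresses[, .(node_id, Identity = "Addresses")],
  fill = TRUE
)
print(toy_long)

# Wide form: one column per source table; mostly NA when IDs rarely overlap
toy_wide <- merge(
  toy_entities[, .(node_id, Ent.name = name)],
  toy_addresses,
  by = "node_id", all = TRUE
)
print(toy_wide)
```

In the long form the only gap is the missing name of the address node, whereas the wide form adds a mostly empty column for every source table.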

########
# 
# ## rename the individual datasets, to combine them via node_id
# colnames(Entities) <- paste("Ent", colnames(Entities), sep = ".")
# colnames(Intermediaries) <- paste("Int", colnames(Intermediaries), sep = ".")
# colnames(Officers) <- paste("Off", colnames(Officers), sep = ".")
# 
# ## Merging all the inputs together.
# testx<-Intermediaries[Entities, on= c(Int.node_id="Ent.node_id" )]%>%
#   .[Officers, on = c(Int.node_id="Off.node_id")]
#
#  
########

6.1.1 Exploring the inner membership within clusters

Subnodes_ELarge

### Degree, the connections of edges
subnodes_attributes<-data.table(
  nodes_id = as.integer(names(igraph::degree(decomposed_graph_list[[1]], mode = "all"))),
  nodes_degree_all = igraph::degree(decomposed_graph_list[[1]], mode = "all"),
  nodes_degree_out = igraph::degree(decomposed_graph_list[[1]], mode = "out"),
  nodes_betweenness= igraph::betweenness(decomposed_graph_list[[1]]),
  nodes_centrality = eigen_centrality(decomposed_graph_list[[1]])$vector)

# # Exploring centrality
# H_Degree_dt<-subnodes_attributes[order(-nodes_degree_all)] %>%
#   Nodes[., on=c(node_id="nodes_id")]%>%
#   .[!(Identity=="Addresses"),]%>%
#   head(10)
# 
# ##Highcharter
# #tooltip table
# x<- c("NodeName", "Degree all","Degree out","Betweenness", "Centrality")
# y<- c("{point.nameID}", (sprintf("{point.%s:.2f}",
#                                  c("nodes_degree_all", "nodes_degree_out",
#                                    "nodes_betweenness", "nodes_centrality"))))
# tltip<- tooltip_table(x,y)
# 
# #plot
# hchart(H_Degree_dt, "scatter", hcaes(nodes_degree_all, nodes_degree_out, 
#                                          size= nodes_centrality, color=nodes_betweenness), dataLabels=list(enabled=T, format = '{point.nameID}'))%>%
#   hc_title(text="Network Attributes of Nodes in panama-paradise papers")%>%
#   hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)
# Exploring centrality
H_Centrality_dt<-subnodes_attributes[order(-nodes_centrality)] %>%
  Nodes[., on=c(node_id="nodes_id")]%>%
  .[!(Identity=="Addresses"),]%>%
  head(15)

# #plot
# hchart(H_Centrality_dt, "scatter", hcaes(nodes_centrality, nodes_betweenness, 
#                                          size= nodes_degree_all, color=nodes_degree_out), dataLabels=list(enabled=T, format = '{point.nameID}'))%>%
#   hc_title(text="Network Attributes of Nodes in panama-paradise papers")%>%
#   hc_tooltip(useHTML=TRUE, headerFormat="",pointFormat=tltip)%>%
#   hc_xAxis(type = "logarithmic")%>%
#   hc_yAxis(type = "logarithmic")

6.1.1.1 Clustering with label propagation

# subnodes_Exlarge<-data.table(as_data_frame(decomposed_graph_list[[1]], what = c("vertices")))
# subedges_Exlarge<-data.table(as_data_frame(decomposed_graph_list[[1]], what = c("edges")))
# subnet_Elarge_net<-graph.data.frame(subedges_Exlarge, subnodes_Exlarge, directed = F)
# 

# # subnodes_cluster_lec<-cluster_leading_eigen(subnet_Elarge_net)
# set.seed(1)
# subnodes_cluster_prop<-cluster_label_prop(subnet_Elarge_net)
# 
# # print(subnodes_cluster_prop)
# # modularity(subnodes_cluster_prop)
# # length(subnodes_cluster_prop)
# # # membership(subnodes_cluster_prop)
# # sizes(subnodes_cluster_prop)
# 
# 
# ## Extracting the nodes and its communities by membership in a cluster
# ex<-induced.subgraph(subnet_Elarge_net, which(membership(subnodes_cluster_prop)==35))